Exploring the naturalness of several German high-quality-text-to-speech systems

نویسندگان

  • Hansjörg Mixdorff
  • Dieter Mehnert
چکیده

The synthesis of near-to-natural F0 contours is an important issue in text-to-speech and crucial to the naturalness and intelligibility of synthetic speech. In earlier studies of the first author a model of German intonation was developed that is based on the quantitative Fujisaki-model. The current paper addresses a perception experiment comparing a TTS-system incorporating this new approach with several German TTS-systems with high segmental quality. Natural speech samples and a synthesis version with natural segment durations were used as references. Results show, that the natural speech samples unanimously received 10 points on a 0 to 10 point scale. The best TTS-systems cluster around a mean value of 5.0, whereas the variant with natural durations reached a mean score of 6.6 points, indicating the importance of closely modeling natural segment durations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

مراحل و نحوه ی تهیه ی دادگان های صوتی هجایی و دایفونی برای سامانه ی تبدیل متن به گفتار فارسی

Abstract Speech databases are part of the concatenative text to speech synthesis systems. Phonetic quality of the databases plays a significant role in the naturalness of the synthesized speech. This paper introduces two syllable and diphone speech databases for Persian and investigates the way of their development and their specifications and their advantages to each other. ...

متن کامل

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...

متن کامل

On-line experimental methods to evaluate text-to-speech (TTS) synthesis: effects of voice gender and signal quality on intelligibility, naturalness and preference

Three experiments are reported that use new experimental methods for the evaluation of text-to-speech (TTS) synthesis from the user’s perspective. Experiment 1, using sentence stimuli, and Experiment 2, using discrete ‘‘call centre’’ word stimuli, investigated the effect of voice gender and signal quality on the intelligibility of three concatenative TTS synthesis systems. Accuracy and search t...

متن کامل

Perceptual Quality Dimensions of Text-to-Speech Systems

The aim of this paper is to analyze the perceptual quality dimensions of state-of-the-art text-to-speech systems (TTS). Therefore, several pretests were conducted to determine a suitable set of attribute scales. The resulting 16 scales were used in a semantic differential on a diverse database containing 16 different TTS systems. A subsequent multidimensional analysis (Principal Axis Factor ana...

متن کامل

Prediction of Perceived Sound Quality of Synthetic Speech

This paper investigates the performance of objective speech and audio quality measures for the prediction of perceived sound quality of synthetic speech. A number of existing quality measures have been applied to synthetic speech generated by different speech synthesizers such like LP synthesizer, HSM synthesizer, STRAIGHT synthesizer and several HMM based text-to-speech synthesis systems. The ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999